ESOMIR: a curated database of biomarker genes and miRNAs associated with esophageal cancer

Abstract
Esophageal cancer (EC) is a highly aggressive and deadly complex disease. It comprises two types, esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC), with Barrett’s esophagus (BE) being the only known precursor of EAC. Recent research has revealed that microRNAs (miRNAs), which are involved in various human diseases, play a crucial role in the development, prognosis and treatment of EC. Biological databases have become essential for cancer research as they provide information on genes, proteins, pathways and their interactions. These databases collect, store and manage large amounts of molecular data, which can be used to identify patterns, predict outcomes and generate hypotheses. However, no comprehensive database exists for EC and miRNA relationships. To address this gap, we developed a dynamic database named ESOMIR (miRNA in esophageal cancer) (https://esomir.dqweilab-sjtu.com), which includes information about targeted genes and miRNAs associated with EC. The database combines analysis and prediction methods with experimentally validated miRNA information. ESOMIR provides a user-friendly interface that allows easy access to EC-associated data by searching for miRNAs, target genes, sequences, chromosomal positions and associated signaling pathways. The search modules are designed to provide specific data access to users based on their requirements. Additionally, the database provides information about network interactions, signaling pathways and chromosomal region information for the 3ʹ untranslated region (3ʹUTR), 5ʹUTR and exon sites. Users can also access the energy levels of specific miRNAs with their targeted genes. A fuzzy term search is included in each module to enhance the ease of use for researchers. ESOMIR can be a valuable tool for researchers and clinicians to gain insight into EC, including identifying biomarkers and treatments for this aggressive tumor.
Database URL: https://esomir.dqweilab-sjtu.com


Introduction
Esophageal cancer (EC) is one of the deadliest cancers. In 2020, EC was responsible for 0.54 million deaths and 0.6 million new cases worldwide according to the Global Cancer Statistics 2020 GLOBOCAN estimates (1). Histologically, EC has two subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) (2). Barrett's esophagus (BE) is believed to be the only precursor to EAC (3). The two subtypes differ markedly in primary risk factors, time trends, geographical patterns and genetic associations (4). The multistep process of EC development (ESCC and EAC) involves genetic events that lead to important abnormalities in cell cycle control, neurotrophic interactions and adhesion molecule mechanisms (5). ESCC is predominant in Asian and sub-Saharan African regions, while EAC is predominant in Western countries (1).
MicroRNAs (miRNAs) are short [18-25 nucleotides (nt)] endogenous non-coding RNAs that control gene expression. They are among the smallest single-stranded non-coding RNA molecules and regulate genes post-transcriptionally in various species (6,7). miRNAs play a key role in regulating biological, physiological and cellular developmental processes, including apoptosis, cell proliferation, differentiation, invasion, metastasis, metabolism, tumorigenic progression, resistance to cell death and stem cell maintenance (8,9). Depending on whether they are downregulated or upregulated, miRNAs can act as tumor suppressors or oncogenes (10). A recent study reported that most miRNA gene mutations are found in genomic regions linked to cancer (11). Unfortunately, EC (both ESCC and EAC) is diagnosed at an advanced or metastatic stage in most cases because it is asymptomatic early on, resulting in a poor prognosis and a high death rate (12).
Recent advances in tumorigenesis research show that studies of molecular mechanisms play a vital role in understanding cancer (13). Molecular biomarkers are potential tools for assessing cancer invasion and metastasis and for improving prognosis (14)(15)(16). miRNAs are a promising discovery for cancer treatment because they act as regulators of gene expression. miRNA target identification has been a prominent area of research in recent years to gain insights into the signaling pathways, genes and mechanisms involved in cancer (17)(18)(19)(20)(21). miRNA target prediction in vertebrates is complex because the seed region shows irregular homology with the target sequence (22). There is substantial evidence that miRNAs play a critical role in the development of EC (23)(24)(25)(26). miRNAs have also been used to classify EC by stage or type (27).
miRNAs identified through computational methods and subsequently validated experimentally have the highest potential to elucidate the mechanisms underlying cancer hallmarks. Several miRNA discoveries associated with EC networks have shown promising results for prognosis, diagnosis and treatment. There is a growing demand for a user-friendly knowledge repository containing all relevant information, including miRNAs, genes, pathways, targets, networks and more (6,7).
Several databases provide chromosome-based information associated with targeted messenger RNA (mRNA), cancer pathways and lists of miRNAs. Some comprehensive databases, such as miRBase (28), miRGen v.3 (29) and MiRGator v 3.0 (30), provide annotated sequences, nomenclature and target information. Manually curated databases, including miRWalk (31), miRTarBase (32), miRNAMap (33) and mirRecord (34), offer experimentally validated perspectives on associated miRNAs, their interactions and targets. Additionally, there are computational target prediction databases, such as TargetScan (35), miRDB (36), PicTar (37) and MirRabel (38). Several disease-specific miRNA repositories also exist, such as OncomiRdbB (39), miRCancer (40), miRactDB (41), PhenoMir (42), HMDD (43) and S-MED (44). However, these databases focus on other cancer types and provide detailed information about those diseases. No detailed repository provides insightful information about miRNAs involved in EC and its subtypes. Therefore, we designed and developed a comprehensive miRNA-based repository that lets researchers access mRNA-miRNA targets, signaling pathways and targeted gene ontologies for EC. We created a web-based repository with a graphical user interface called ESOMIR (https://esomir.dqweilab-sjtu.com). Our database contains 877 human miRNAs and 133 mRNAs that act as miRNA targets in various signaling pathways, and we used miRWalk and TargetScan to cross-validate the putative targets for miRNAs. ESOMIR is a promising tool for the diagnostic paradigm in EC, as it is a unique, comprehensive data repository freely accessible to everyone.

Quarrying and categorizing miRNAs and their associated targets
We mined all miRNAs from the literature and various databases, specifically miRTarBase, miRDB, miRBase, MicroCosm, miR2Disease (45) and miRSystem (46). We wrote a Python function to exclude duplicated entries and used a script to acquire all mature miRNA FASTA sequences from miRBase. We extracted the targeted sites (3ʹUTRs, 5ʹUTRs and exons) of oncogenes from the literature and knowledge systems such as COSMIC (Catalogue of Somatic Mutations in Cancer) (47), KEGG (Kyoto Encyclopedia of Genes and Genomes) (48), DDEC (Dragon Database of Genes Implicated in Esophageal Cancer) (49), CGED (Cancer Gene Expression Database) (50) and DisGeNET (51). We used the Ensembl-BioMart tool (52) to download the FASTA sequences of all oncogenes. Then, we screened 2654 miRNAs against EC oncogenes using miRanda (53), DIANA-microT-CDS (54) and RNA22 v2. We identified potential miRNAs and their target sites based on the standard scores set for an appropriate target. After the screening, we used the KEGG database to find the signaling pathways associated with each putative target. The complete workflow, from data mining to processing, prediction and validation, is illustrated in Figure 1.
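The duplicate-exclusion step can be pictured with a minimal Python sketch. This is not the authors' function: the record layout, the field names and the choice of the accession as the deduplication key are assumptions made here for illustration only.

```python
def deduplicate_mirnas(records):
    """Keep the first record seen for each miRNA accession.

    Assumption: the accession is the deduplication key and is compared
    case-insensitively; the original function may have used another key.
    """
    seen = set()
    unique = []
    for rec in records:
        key = rec["accession"].strip().upper()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Illustrative records, as if merged from several source databases.
merged = [
    {"accession": "MIMAT0000076", "name": "hsa-miR-21-5p", "source": "miRBase"},
    {"accession": "mimat0000076", "name": "hsa-miR-21-5p", "source": "miRTarBase"},
    {"accession": "MIMAT0000255", "name": "hsa-miR-34a-5p", "source": "miRDB"},
]
unique = deduplicate_mirnas(merged)
print([rec["name"] for rec in unique])
```

Keeping the first occurrence preserves the order in which sources were merged, so the preferred source can simply be listed first.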

The structure of database and its contents
Database development is complex when the database must be served to users through a web-based interface with both accuracy and speed. We designed and developed a database structure that allows users to access and extract information about miRNAs and their targets for human EC. ESOMIR is a cloud-based web portal that gives access to in-depth analyzed data. Its interface is user-friendly and allows easy, efficient access to information. We built ESOMIR using PHP, MySQL, HTML, DataTables, Ajax, Bootstrap and jQuery.

Back-end development
We used the XAMPP server to build a cloud-based online portal, integrating components such as the Apache server, the database (MySQL) and the web interface. We developed the online portal using PHP and HTML. We used MySQL and DataTables for data storage and associated operations. Additionally, to develop the data extraction module, we implemented logical scripts to search for miRNAs, target genes or pathways associated with EC (Figure 1). The portal is linked directly to the underlying database, so users query the data directly. ESOMIR offers multiple fields for accessing data associated with EC, including miRNA, accession number, oncogene, target site, gene description and chromosome location.
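As a rough illustration of such a search script, the sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL/PHP stack described above. The table name, column names and rows are hypothetical placeholders, not the ESOMIR schema or its data.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL back end (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE interactions (
           mirna TEXT, accession TEXT, gene TEXT,
           target_site TEXT, energy_kcal_mol REAL, pathway TEXT)"""
)
# Placeholder rows for illustration only; values are not taken from ESOMIR.
rows = [
    ("hsa-miR-21-5p", "MIMAT0000076", "PTEN", "3'UTR", -18.2, "mTOR"),
    ("hsa-miR-34a-5p", "MIMAT0000255", "LEF1", "3'UTR", -21.5, "Hippo"),
]
conn.executemany("INSERT INTO interactions VALUES (?, ?, ?, ?, ?, ?)", rows)

# Only whitelisted columns may be searched, so the column name can be
# interpolated safely while the search term stays a bound parameter.
SEARCHABLE = {"mirna", "gene", "pathway"}


def search(conn, field, term):
    """Return (mirna, gene, pathway) rows whose `field` contains `term`."""
    if field not in SEARCHABLE:
        raise ValueError(f"unsupported search field: {field}")
    sql = (
        "SELECT mirna, gene, pathway FROM interactions "
        f"WHERE {field} LIKE ? ORDER BY mirna"
    )
    return conn.execute(sql, (f"%{term}%",)).fetchall()


print(search(conn, "gene", "pten"))
```

Binding the search term as a parameter rather than concatenating it into the query is the usual guard against SQL injection in a public search form.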

Front-end development
The primary purpose of the ESOMIR interface is to provide intuitive interaction with the portal so that users feel comfortable using it. Ease and clarity are the foremost priorities: every group of users should be able to access the data without effort and with enough technical assistance to carry out bioinformatics analyses effectively. We used Cascading Style Sheets (CSS), JavaScript, DataTables and Bootstrap to give the online portal a modern look.

Selection and validation of miRNAs as putative EC miRNAs
To categorize potential miRNAs associated with EC, we performed prediction and analysis against all collected oncogenes using miRanda, RNA22v2 and DIANA-microT-CDS. To mine putative miRNAs, we established specific parameters for each algorithm. In miRanda, we set the energy level (EL) cutoff to −15 kcal/mol with a score threshold of 120. The miRanda algorithm has standard settings for reasonably predicting potential target sites. The free energy of RNA molecules, which is involved in unfolding the interaction sites to allow nucleotide pairing between miRNAs and mRNAs, is an essential property that facilitates their interactions. Therefore, a lower overall free energy means a more stable miRNA-mRNA complex, indicating a higher likelihood of a genuine interaction.
For RNA22v2, we selected entities with a folding energy of −12, a P-value of 0.1 and more than three seeds. Additionally, we chose a threshold of 0.7 with a score of 90 using DIANA-microT-CDS, as this is the suggested threshold range for sensitive analysis with significant results. After all selection and analysis, we identified 2028 potential miRNAs and their associated target genes and pathways for EC. We excluded all entries and miRNAs that did not meet these criteria. To assess the significance of ESOMIR among other existing database systems, we plotted a comparison graph in Figure 2A. We further validated the resulting miRNAs through miRWalk and TargetScan, where only experimentally validated connections were considered putative miRNAs.
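The cutoffs above can be read as simple filters over the prediction output. The following Python sketch shows one possible reading for miRanda and RNA22v2; the direction of each inequality (energy at or below the cutoff, score at or above it, P-value at or below it), the field names and the example hits are all assumptions, and the seed-count and DIANA-microT-CDS criteria are left out.

```python
# Cutoffs quoted in the text; inequality directions are an assumption.
MIRANDA_MAX_ENERGY = -15.0        # kcal/mol, lower means more stable
MIRANDA_MIN_SCORE = 120.0
RNA22_MAX_FOLDING_ENERGY = -12.0
RNA22_MAX_P_VALUE = 0.1


def passes_miranda(hit):
    """True if a miRanda hit meets both the energy and score cutoffs."""
    return (hit["energy"] <= MIRANDA_MAX_ENERGY
            and hit["score"] >= MIRANDA_MIN_SCORE)


def passes_rna22(hit):
    """True if an RNA22v2 hit meets the folding-energy and P-value cutoffs."""
    return (hit["folding_energy"] <= RNA22_MAX_FOLDING_ENERGY
            and hit["p_value"] <= RNA22_MAX_P_VALUE)


def rank_by_stability(hits):
    """Order hits from most to least stable (most negative energy first)."""
    return sorted(hits, key=lambda h: h["energy"])


# Hypothetical miRanda hits with placeholder names and values.
miranda_hits = [
    {"mirna": "miR-A", "gene": "GENE1", "energy": -22.4, "score": 151.0},
    {"mirna": "miR-B", "gene": "GENE2", "energy": -11.0, "score": 140.0},
    {"mirna": "miR-C", "gene": "GENE3", "energy": -30.1, "score": 118.0},
    {"mirna": "miR-D", "gene": "GENE4", "energy": -15.0, "score": 120.0},
]
kept = rank_by_stability([h for h in miranda_hits if passes_miranda(h)])
print([h["mirna"] for h in kept])
```

In this sketch miR-B fails on energy and miR-C fails on score, which shows why both criteria are applied together rather than energy alone.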

Target-based network enrichment analysis via KEGG pathways
EC is known to be regulated by different signaling pathways via miRNAs, which have been shown to be significant players in the development of this disease (55)(56)(57)(58)(59)(60)(61)(62)(63)(64)(65). Using miRanda and KEGG, we identified some critical pathways in ESOMIR, including Wnt, NF-κB, TGFB, Apoptosis, Hippo, JAK-STAT and NOTCH, based on different ELs and a threshold score of 120. To ensure accuracy, we employed miRWalk and TargetScan for additional validation. In total, we identified 107 genes as miRNA targets across all signaling pathways in humans with EC. ESOMIR includes a higher number of genes as targets for EC-specific miRNAs than other cancer databases such as MicroCosm, emiRiT (66), miR2Disease, miRSystem, HMDD v2.0 (43) and miRCancer (Figure 2B). Among all signaling pathways, mTOR, apoptosis and NF-κB recorded the highest number of targeted mRNA-miRNA pairs associated with EC, while NOTCH signaling had the fewest targets (Figure 3). A single miRNA can target one or several genes, and several miRNAs can target a single gene. Therefore, studying all signaling pathways associated with EC, at either the miRNA or the mRNA level, can lead to exciting discoveries. Further exploration of miRNA-mRNA interactions via signaling pathways could provide an in-depth understanding of possible miRNA-mRNA regulation. Considering the significant role of miRNAs in regulating mRNAs in disease development, we used data on miRNAs and their targets in each signaling pathway to characterize their interactions. To showcase the miRNA-target interactions, we constructed a binary matrix for each signaling pathway, with 1 indicating an mRNA targeted by an miRNA and 0 indicating an mRNA not targeted. We then converted the miRNA-mRNA target data frame into a matrix to extract error-free data using an open-source Bioconductor package of R.
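The binary matrix described above can be sketched in a few lines. The study itself used an R/Bioconductor package; the plain-Python version below is a substitute chosen for illustration, with miRNAs as rows and genes as columns (the orientation is an assumption), and the example pairs are taken from interactions mentioned in the Discussion.

```python
def binary_matrix(pairs):
    """Build a 0/1 matrix from (miRNA, gene) target pairs.

    Rows are miRNAs and columns are genes, both sorted alphabetically.
    A cell is 1 if the miRNA targets the gene and 0 otherwise.
    """
    mirnas = sorted({m for m, _ in pairs})
    genes = sorted({g for _, g in pairs})
    targeted = set(pairs)
    matrix = [[1 if (m, g) in targeted else 0 for g in genes] for m in mirnas]
    return mirnas, genes, matrix


# Example pairs for one pathway (illustrative subset only).
pairs = [
    ("miR-31", "LATS2"),
    ("miR-373", "LATS2"),
    ("miR-373", "OXR1"),
    ("miR-34a-5p", "LEF1"),
]
mirnas, genes, matrix = binary_matrix(pairs)
print(genes)
for name, row in zip(mirnas, matrix):
    print(name, row)
```

Row sums give the number of genes each miRNA targets, and column sums give the number of miRNAs converging on each gene, which reflects the many-to-many relationship noted above.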

Results
ESOMIR is a comprehensive database that covers multiple key features of EC-related information. The database contains a rich compilation of 125 unique genes and 877 miRNAs implicated in EC. Accessible through a web-based portal, the database offers users various columns, each providing further details on the specific entity or entry under investigation. These columns include miRNA, accession, target site, EL, gene and associated information, chromosomal location and signaling pathway. Consequently, ESOMIR offers a holistic view of EC-related data.

Structure of the ESOMIR database modules
We intentionally designed the homepage of ESOMIR to present EC information simply and comprehensively. It also gives summary information about the project. The webpage comprises a header, footer, responsive side navigation bar and body. A toggle button lets users show or hide the navigation bar. The side navigation bar serves as the primary menu, enabling users to reach the different options promptly. We provide five pages: the homepage, search page, statistics page, help page and contact page. Figure 4 shows the homepage.

Fuzzy search
On every page of ESOMIR, we provide a fuzzy term-based search for quick access. Users enter a minimum of three characters to get results. Users can search in any context and retrieve any data record associated with EC. We implemented fuzzy search to help users who do not know the whole term or who misspell it, so that relevant results are still presented. Figure 4 shows the fuzzy search.
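The behavior described here (a three-character minimum, partial terms and tolerance for misspellings) can be approximated with Python's standard difflib module. This is only a sketch of the idea: ESOMIR's actual search is implemented in its PHP/DataTables stack, and the function name, similarity cutoff and term list below are assumptions.

```python
from difflib import get_close_matches

MIN_QUERY_LENGTH = 3  # the minimum number of characters stated in the text


def fuzzy_search(query, terms, cutoff=0.6):
    """Return terms that contain the query or closely resemble it."""
    query = query.strip().lower()
    if len(query) < MIN_QUERY_LENGTH:
        return []
    lowered = {t.lower(): t for t in terms}
    # Partial terms: plain case-insensitive substring matches come first.
    hits = [original for low, original in lowered.items() if query in low]
    # Misspellings: add close matches ranked by similarity ratio.
    for low in get_close_matches(query, list(lowered), n=5, cutoff=cutoff):
        if lowered[low] not in hits:
            hits.append(lowered[low])
    return hits


pathways = ["NOTCH", "Wnt", "Hippo"]
print(fuzzy_search("hip", pathways))    # partial term
print(fuzzy_search("NOTHC", pathways))  # transposed letters
```

A lower `cutoff` tolerates more severe misspellings at the cost of more unrelated results.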

Database gateways
ESOMIR's search page provides users with three methods to access and retrieve miRNAs and their target information. The first is an miRNA-based search, which offers a selection of miRNA-related options, such as miRBase ID, accession or mature sequence. These options are available through a drop-down menu and a search bar that users can use to filter or narrow their search based on specific data. The second option is gene-centric, enabling users to refine their search by selecting the gene symbol, Ensembl ID or Entrez ID and entering the corresponding gene in the search box. The third method is interaction-network-based and enables users to explore the signaling pathways associated with genes, miRNAs and their interactions by selecting pathways from a drop-down menu. Notable pathways include Wnt, NF-κB, TGFB, Apoptosis, Hippo, JAK-STAT and NOTCH. Figure 5 offers a comprehensive representation of ESOMIR's search page.

Upshot of data approachability
The search results are presented in a table with columns for miRNA, mRNA and associated information. The first column displays the miRNA and its accession number, which links to www.mirbase.org. The table also shows the predicted target site of each miRNA, which can be the 3ʹUTR, 5ʹUTR or an exon. Additionally, a column provides the ELs in kcal/mol. The respective columns list the associated pathways. Targeted gene information is available through a link that redirects to GenAtlas.
Furthermore, the table provides links to www.ensembl.org and www.biogps.org, allowing users to access the customizable layout through the Ensembl ID and Entrez ID columns. The chromosome column provides the chromosomal location, with a link to GeneCards, for users to obtain a broader view of gene-centric information. The signaling pathways column maps the associated pathways of the specific miRNA, and users can also map the target gene onto the pathway in EC. The analysis includes vital pathways such as Wnt, NF-κB, TGFB, Apoptosis, Hippo, JAK-STAT and NOTCH. Users can access more detailed information about a specific pathway by being redirected to the KEGG pathway site. The resulting window provides pertinent information on crucial EC miRNAs and their well-recognized targets at various ELs. This information is deposited on the pathway information result page for users to access effortlessly. Target-based pathway information is available as an image file, which users can view in Figure 6. The target pathway search module presents pathway results differently from the other modules. It presents information about miRNA-mRNA interactions at the 3ʹUTR, 5ʹUTR and exon in a network with associated nodes and connections, as shown in Figure 7.

The importance and abundance of data represented in ESOMIR
The abundance and importance of the data in ESOMIR are reflected in its large number of entries, which in turn reflects the large number of studies examined to collect, validate and verify miRNA-target interactions associated with EC. The database currently includes >2000 miRNA-target interactions curated from various sources, including experimental studies, computational predictions and literature mining. The density of the data is shown in Figure 8.

Discussion
Biological data have become a critical resource for studying complex diseases, with numerous databases available that target specific diseases and associated data. However, while miRNA-based databases and their targets are essential, few compile comprehensive knowledge about specific diseases and their targets. Currently, no available database exclusively provides information on the association of miRNAs with EC. To address this, we have developed ESOMIR, a dynamic and user-friendly database that provides comprehensive information on predicted and experimentally validated miRNAs with their putative targets. This database is anticipated to aid in the discovery of potential biomarkers for EC that have diagnostic and therapeutic value. Compared to other databases such as MicroCosm, miR2Disease, emiRiT, TargetScan, HMDD v2.0 and miRCancer, ESOMIR contains the highest number of EC-specific miRNAs, with 877 human miRNAs (Figure 2A). This research also introduces novel miRNAs with their target information in different pathways.
The ESOMIR database has a user-friendly web interface that enables the query of miRNAs, target mRNAs and their interaction network. Additionally, it provides target-ontology information for all genes in oncogenic pathways, making it a broad platform for studying miRNAs and their putative role in the development and progression of EC. The identification and functional prediction of miRNA targets is essential in assessing the role of miRNAs in regulating or deregulating common genes and pathways that contribute to cancer development (67). Despite the continuous evolution of miRNA target prediction algorithms, effective prediction of miRNA targets in animals remains more challenging than in plants (68). miRNAs regulate target genes by binding to their 3ʹUTR, 5ʹUTR or coding regions through the seed region (5ʹ end of the miRNA), and incomplete base pairing at the complementary site makes the task more difficult for target prediction tools (69). To minimize false positives, we employed the well-known target prediction algorithms miRanda, RNA22v2 and DIANA-microT-CDS at different ELs, with validation by miRWalk and TargetScan. Targets predicted at the lowest EL (−30 kcal/mol) represent a strong pairing of an miRNA with its target, whereas an EL of −15 kcal/mol is a more relaxed criterion for finding novel miRNA targets. This approach can help researchers narrow down and screen the most promising putative miRNA targets in EC for validation. We found that the ESOMIR database contains 123 miRNA targets for humans, representing the highest number of predicted putative targets for EC miRNAs compared to any other available database (Figure 2B). Down-regulation of a wide range of oncogenes by miRNAs involved in different signaling pathways can help maintain the healthy state of normal cells. Therefore, classifying EC miRNA targets by signaling pathway and interpreting miRNA-mRNA interaction networks at the 3ʹUTR, 5ʹUTR and exon regions of target genes could provide detailed information on the role of miRNAs in EC initiation and progression. Several studies demonstrate the interaction between miRNAs and signaling pathways across the hallmark processes of EC. For instance, Li et al. reported cross-interaction between miRNAs and NOTCH signaling in EC, specifically in ESCC (70). Moreover, Gao et al. demonstrated that overexpressed miR-31 acts as an oncogene in ESCC, repressing the expression of LATS2 through the Hippo pathway and activating epithelial-mesenchymal transition (71).
According to the ESOMIR database, the interaction between miRNA-mRNA networks in different signaling pathways is crucial in facilitating cross talk. For example, miR-34a-5p inhibits proliferation, migration, invasion and epithelial-mesenchymal transition in ESCC by targeting LEF1 and inactivating the Hippo-YAP1/TAZ signaling pathway (72). miRNA-373 also promotes the development of ESCC by targeting LATS2 and OXR1. Furthermore, miRNA-21 promotes cell proliferation, migration and resistance to apoptosis in EC through the PTEN/PI3K/AKT signaling pathway (58,73). The mTOR pathway and its interaction networks have been identified as potential therapeutic targets in the treatment of EC (59). The NF-κB signaling pathway, a transcriptional activator, has been identified as a potential molecular marker for predicting and improving treatment efficacy in EC (74)(75)(76). The roles of Wnt/β-catenin signaling pathway-related miRNAs in EC have also been defined (64). In particular, down-regulation of miR-30a-3p/5p has been found to promote cell proliferation in ESCC by activating the Wnt signaling pathway (77).

Conclusion
The ESOMIR database is a valuable resource for researchers and clinicians interested in studying miRNA-target interactions associated with EC. With >2000 meticulously curated miRNA-target interactions, this database provides a wealth of information on the regulatory networks of miRNAs and their target genes in various biological processes. The database combines analysis and prediction methods with experimentally validated miRNA information. ESOMIR provides a user-friendly interface that allows easy access to EC-associated data by searching for miRNAs, target genes, sequences, chromosomal positions and associated signaling pathways. The search modules are designed to provide specific data access to users based on their requirements. Additionally, the database provides information about network interactions, signaling pathways and chromosomal region information for the 3ʹUTR, 5ʹUTR and exon sites. Overall, the ESOMIR database is a promising platform for identifying potential therapeutic targets and developing personalized cancer treatments. Future updates could include additional data on miRNA-target interactions associated with other types of cancers or diseases. These updates would further expand the database's usefulness and make it a more comprehensive resource for researchers and clinicians. Additionally, incorporating machine learning algorithms could help to identify novel miRNA-target interactions and further enhance the predictive power of the database.

Figure 1. This diagram illustrates the ESOMIR modules, representing data analysis, prediction and mining procedures. Each stage in the process is clearly defined in this schematic, outlining the steps taken to reach the desired outcome.

Figure 2. (A) Comparison of all available miRNA targets associated with EC in ESOMIR and other available databases. The figure shows that ESOMIR has the highest number of miRNAs included in contrast to the other available databases. (B) Comparison of all available mRNAs associated with EC in ESOMIR and other available databases.

Figure 3. Signaling pathway targeted by selected putative miRNA associated with EC.The respective pathways along with their percentages are given and represented by different colors.

Figure 4. ESOMIR Homepage: a comprehensive miRNA database for EC.The home page shows the description of the database with different tabs that are included in the database.The Fuzzy Search Functionality implemented in ESOMIR is also given.

Figure 5. Depiction of the Search page, which gives access to the data through different modules. Information can be retrieved by searching by miRNA identifier, by target gene or by miRNA-targeted pathway.

Figure 6. An illustration of the ESOMIR result page presented in a tabular format. The results show the miRNA, accession ID, target site, EL and other information including the gene name and pathways.

Figure 7. A depiction of the search results for the apoptosis signaling pathway, showcasing the target miRNA-mRNA network.

Figure 8. An illustration of >2000 mRNA-miRNA interactions that are exclusively linked to EC in the ESOMIR database.

Figure 9. A depiction of the miRNA-target interactions, along with their corresponding signaling pathways.

Figure 10. An illustration of top key mRNA and miRNA targets associated with EC. (A) The top 10% identified genes associated with EC. (B) The top 10% identified miRNAs associated with EC. (C) A network that visualizes the top connections and associations between mRNA and miRNAs, highlighting the most significant interactions among them.